Method and apparatus for higher order ambisonics encoding and decoding using singular value decomposition

ABSTRACT

The encoding and decoding of HOA signals using Singular Value Decomposition includes forming (11), based on sound source direction values and an Ambisonics order, corresponding ket vectors (|Y(Ω_(s))⟩) of spherical harmonics and an encoder mode matrix (Ξ_(O×S)). From the audio input signal (|x(Ω_(s))⟩) a singular threshold value (σ_(ε)) is determined. On the encoder mode matrix a Singular Value Decomposition (13) is carried out in order to get related singular values which are compared with the threshold value, leading to a final encoder mode matrix rank (r_(fin_e)). Based on direction values (Ω_(l)) of loudspeakers and a decoder Ambisonics order (N_(l)), corresponding ket vectors (|Y(Ω_(l))⟩) and a decoder mode matrix (Ψ_(O×L)) are formed (18). On the decoder mode matrix a Singular Value Decomposition (19) is carried out, providing a final decoder mode matrix rank (r_(fin_d)). From the final encoder and decoder mode matrix ranks a final mode matrix rank is determined, and from this final mode matrix rank and the encoder side Singular Value Decomposition an adjoint pseudo inverse (Ξ⁺)^(†) of the encoder mode matrix (Ξ_(O×S)) and an Ambisonics ket vector (|a′_(s)⟩) are calculated. The number of components of the Ambisonics ket vector is reduced (16) according to the final mode matrix rank so as to provide an adapted Ambisonics ket vector (|a′_(l)⟩). From the adapted Ambisonics ket vector, the output values of the decoder side Singular Value Decomposition and the final mode matrix rank an adjoint decoder mode matrix (Ψ)^(†) is calculated (15), resulting in a ket vector (|y(Ω_(l))⟩) of output signals for all loudspeakers.

This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP2014/074903, filed Nov. 18, 2014, which was published in accordance with PCT Article 21(2) on Jun. 4, 2015 in English and which claims the benefit of European patent application No. 13306629.0, filed Nov. 28, 2013.

TECHNICAL FIELD

The invention relates to a method and to an apparatus for Higher Order Ambisonics encoding and decoding using Singular Value Decomposition.

BACKGROUND

Higher Order Ambisonics (HOA) represents three-dimensional sound. Other techniques are wave field synthesis (WFS) or channel based approaches like 22.2. In contrast to channel based methods, however, the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.

HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients.

These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels in the following. An HOA representation can be expressed as a temporal sequence of HOA data frames containing HOA coefficients. The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. For the 3D case, the number of expansion coefficients O grows quadratically with the order N, in particular O=(N+1)².

Complex Vector Space

Ambisonics has to deal with complex functions. Therefore a notation is introduced which is based on complex vector spaces. It operates with abstract complex vectors, which do not represent real geometrical vectors known from the three-dimensional ‘xyz’ coordinate system. Instead, each complex vector describes a possible state of a physical system and is formed by column vectors in a d-dimensional space with d components x_(i). Following Dirac, these column-oriented vectors are called ket vectors and are denoted as |x⟩. In a d-dimensional space, an arbitrary |x⟩ is formed by its components x_(i) and d orthonormal basis vectors |e_(i)⟩:

$\begin{matrix}{|x\rangle = x_{1}|e_{1}\rangle + x_{2}|e_{2}\rangle + \ldots + x_{d}|e_{d}\rangle = \sum\limits_{i = 1}^{d}x_{i}|e_{i}\rangle .} & (1)\end{matrix}$

Here, that d-dimensional space is not the normal ‘xyz’ 3D space.

The complex conjugate transpose of a ket vector is called a bra vector: |x⟩* = ⟨x|. Bra vectors represent a row-based description and form the dual space of the original ket space, the bra space.

This Dirac notation will be used in the following description for an Ambisonics related audio system.

The inner product can be built from a bra and a ket vector of the same dimension, resulting in a complex scalar value. If an arbitrary vector |x⟩ is described by its components in an orthonormal vector basis, the specific component for a specific basis vector, i.e. the projection of |x⟩ onto |e_(i)⟩, is given by the inner product: x_(i) = ⟨x‖e_(i)⟩ = ⟨x|e_(i)⟩.  (2)

Only one bar instead of two bars is written between the bra and the ket vector.

For different vectors |x⟩ and |y⟩ in the same basis, the inner product is obtained by multiplying the bra ⟨x| with the ket |y⟩. Because the basis is orthonormal, ⟨e_(i)|e_(j)⟩ is 1 for i=j and 0 otherwise, so that:

$\begin{matrix}{\langle x|y\rangle = \sum\limits_{i = 1}^{d}\langle x_{i}e_{i}| \cdot \sum\limits_{j = 1}^{d}y_{j}|e_{j}\rangle = \sum\limits_{i,j = 1}^{d}x_{i}^{*}y_{j}\langle e_{i}|e_{j}\rangle = \sum\limits_{i = 1}^{d}x_{i}^{*}y_{i} = \left( \sum\limits_{i = 1}^{d}y_{i}^{*}x_{i} \right)^{*}.} & (3)\end{matrix}$

If a ket of dimension m×1 and a bra vector of dimension 1×n are multiplied by an outer product, a matrix A with m rows and n columns is derived: A = |x⟩⟨y|.  (4)
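The bra/ket operations above map directly onto array operations. The following is a minimal NumPy sketch, not part of the original disclosure; the vectors `x`, `y` and `e1` are arbitrary illustrative values.

```python
import numpy as np

# Two kets in a d = 3 dimensional complex space (illustrative values).
x = np.array([1 + 1j, 2, 0 - 1j])
y = np.array([0.5, 1j, 2])

# Inner product <x|y>: np.vdot conjugates its first argument (the bra).
inner = np.vdot(x, y)

# Outer product |x><y|: a ket times a bra gives an m x n matrix.
outer = np.outer(x, y.conj())

# Inner product of the basis bra <e_1| with |x> picks out the first component.
e1 = np.array([1, 0, 0])
x1 = np.vdot(e1, x)
```

Swapping the arguments of the inner product yields the complex conjugate, i.e. `np.vdot(y, x)` equals `inner.conjugate()`.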

Ambisonics Matrices

An Ambisonics-based description considers the dependencies required for mapping a complete sound field into time-variant matrices. In Higher Order Ambisonics (HOA) encoding or decoding matrices, the number of rows (columns) is related to specific directions from the sound source or the sound sink. At encoder side, a variable number S of sound sources is considered, where s=1, . . . , S. Each sound source s can have an individual distance r_(s) from the origin and an individual direction Ω_(s)=(Θ_(s),Φ_(s)), where Θ_(s) describes the inclination angle starting from the z-axis and Φ_(s) describes the azimuth angle starting from the x-axis. The corresponding time dependent signal x_(s)(t) has individual time behaviour.

For simplicity, only the directional part is considered (the radial dependency would be described by Bessel functions). Then a specific direction Ω_(s) is described by the column vector |Y_(n) ^(m)(Ω_(s))⟩, where the lower index n is bounded by the Ambisonics order N and the upper index m runs over the modes belonging to n. The corresponding values run from n=0, . . . , N and m=−n, . . . , 0, . . . , n, respectively.

In general, the specific HOA description restricts the number of components O for each ket vector |Y_(n) ^(m)(Ω_(s))⟩ in the 2D or 3D case depending on N:

$\begin{matrix}{O = \left\{ {\begin{matrix}{{{2N} + 1},} & {2D} \\{\left( {N + 1} \right)^{2},} & {3D}\end{matrix}.} \right.} & (5)\end{matrix}$
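Equation (5) can be checked with a few lines of code. The helper name `hoa_num_components` and its `three_d` flag are hypothetical and introduced only for this sketch.

```python
def hoa_num_components(order: int, three_d: bool = True) -> int:
    """Number of Ambisonics components O for order N, following equation (5)."""
    if order < 0:
        raise ValueError("Ambisonics order must be non-negative")
    # 3D: (N + 1)^2 components, 2D: 2N + 1 components.
    return (order + 1) ** 2 if three_d else 2 * order + 1
```

For example, an order N=3 representation has 16 components in the 3D case and 7 in the 2D case.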

For more than one sound source, all directions are included if the S individual vectors |Y_(n) ^(m)(Ω_(s))⟩ are combined. This leads to a mode matrix Ξ, containing O×S mode components, i.e. each column of Ξ represents a specific direction:

$\begin{matrix}{\Xi = {\begin{bmatrix}{Y_{0}^{0}\left( \Omega_{1} \right)} & \ldots & {Y_{0}^{0}\left( \Omega_{S} \right)} \\{Y_{1}^{- 1}\left( \Omega_{1} \right)} & \ldots & {Y_{1}^{- 1}\left( \Omega_{S} \right)} \\\vdots & \ddots & \vdots \\{Y_{N}^{N}\left( \Omega_{1} \right)} & \ldots & {Y_{N}^{N}\left( \Omega_{S} \right)}\end{bmatrix}.}} & (6)\end{matrix}$

All signal values are combined in the signal vector |x(kT)⟩, which considers the time dependencies of each individual source signal x_(s)(kT), sampled with a common sample rate of 1/T:

$\begin{matrix}{|x(kT)\rangle = {\begin{bmatrix}{x_{1}({kT})} \\{x_{2}({kT})} \\\vdots \\{x_{S}({kT})}\end{bmatrix}.}} & (7)\end{matrix}$

In the following, for simplicity, the sample number k is omitted from time-variant signals like |x(kT)⟩. Then |x⟩ is multiplied with the mode matrix Ξ as shown in equation (8). This ensures that all signal components are linearly combined with the corresponding column of the same direction Ω_(s), leading to a ket vector |a_(s)⟩ with O Ambisonics mode components or coefficients according to equation (5): |a_(s)⟩ = Ξ|x⟩.  (8)
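As an illustration of equations (6) and (8), the sketch below builds a first-order (N=1, O=4) mode matrix and encodes one sample of three source signals. It assumes the common complex spherical harmonic normalisation including the Condon-Shortley phase; the source text does not fix a normalisation, and the helper names `sh_order1` and `mode_matrix` as well as the directions and signal values are hypothetical.

```python
import numpy as np

def sh_order1(theta, phi):
    """Complex spherical harmonics Y_0^0, Y_1^-1, Y_1^0, Y_1^1 for
    inclination theta and azimuth phi (assumed normalisation)."""
    return np.array([
        0.5 * np.sqrt(1 / np.pi),
        0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3 / np.pi) * np.cos(theta),
        -0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ])

def mode_matrix(directions):
    """Stack one column of spherical harmonics per direction (O x S)."""
    return np.column_stack([sh_order1(t, p) for t, p in directions])

# Three illustrative source directions (inclination, azimuth) in radians.
sources = [(np.pi / 2, 0.0), (np.pi / 2, np.pi / 2), (np.pi / 4, np.pi)]
Xi = mode_matrix(sources)          # shape (4, 3)
x = np.array([1.0, 0.5, -0.25])    # one sample of each source signal
a_s = Xi @ x                       # equation (8): |a_s> = Xi |x>
```

Each column of `Xi` corresponds to one source direction, so the product sums the source samples weighted by their direction-dependent mode components.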

The decoder has the task to reproduce the sound field |a_(l)⟩ represented by a dedicated number L of loudspeaker signals |y⟩. Accordingly, the loudspeaker mode matrix Ψ consists of L separate columns of spherical harmonics based unit vectors |Y_(n) ^(m)(Ω_(l))⟩ (similar to equation (6)), i.e. one ket for each loudspeaker direction Ω_(l): |a_(l)⟩ = Ψ|y⟩.  (9)

For square matrices, where the number of modes is equal to the number of loudspeakers, |y⟩ can be determined by inverting the mode matrix Ψ. In the general case of an arbitrary matrix, where the number of rows and columns can be different, the loudspeaker signals |y⟩ can be determined by a pseudo inverse, cf. M. A. Poletti, “A Spherical Harmonic Approach to 3D Surround Sound Systems”, Forum Acusticum, Budapest, 2005. Then, with the pseudo inverse Ψ⁺ of Ψ: |y⟩ = Ψ⁺|a_(l)⟩.  (10)
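The pseudo inverse decoding of equation (10) can be sketched with `numpy.linalg.pinv`. In this sketch a random complex matrix merely stands in for a real loudspeaker mode matrix Ψ (an assumption made to keep the example short), with O=4 modes and L=6 loudspeakers.

```python
import numpy as np

rng = np.random.default_rng(0)
O, L = 4, 6  # number of modes and loudspeakers (illustrative)

# Stand-in for the loudspeaker mode matrix Psi (O x L), not real harmonics.
Psi = rng.standard_normal((O, L)) + 1j * rng.standard_normal((O, L))
# Stand-in for the Ambisonics ket |a_l> at decoder side.
a_l = rng.standard_normal(O) + 1j * rng.standard_normal(O)

# Equation (10): |y> = Psi^+ |a_l>
y = np.linalg.pinv(Psi) @ a_l

# Equation (9) is reproduced because this Psi has full row rank.
a_check = Psi @ y
```

With more loudspeakers than modes the system is underdetermined, and the pseudo inverse selects the loudspeaker signal vector with the smallest norm among all exact solutions.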

It is assumed that the sound fields described at encoder and at decoder side are nearly the same, i.e. |a_(s)⟩ ≈ |a_(l)⟩. However, the loudspeaker positions can be different from the source positions, i.e. for a finite Ambisonics order the real-valued source signals described by |x⟩ and the loudspeaker signals described by |y⟩ are different. Therefore a panning matrix G can be used which maps |x⟩ onto |y⟩. Then, from equations (8) and (10), the chain operation of encoder and decoder is: |y⟩ = GΨ⁺Ξ|x⟩.  (11)

Linear Functional

In order to keep the following equations simpler, the panning matrix will be neglected until section “Summary of invention”. If the number of required basis vectors becomes infinite, one can change from a discrete to a continuous basis. Therefore, a function ƒ can be interpreted as a vector having an infinite number of mode components. This is called a ‘functional’ in a mathematical sense, because it performs a mapping from ket vectors onto specific output values in a deterministic way. It can be described by an inner product between the function ƒ and the ket |x⟩, which results in a complex number c in general:

$\begin{matrix}{\langle f|\left( |x\rangle \right) = \sum\limits_{i = 1}^{N}f_{i} \cdot x_{i} = c.} & (12)\end{matrix}$

If the functional preserves the linear combination of the ket vectors, ƒ is called a ‘linear functional’.

As long as there is a restriction to Hermitean operators, the following characteristics should be considered. Hermitean operators always have:

-   real Eigenvalues;
-   a complete set of orthogonal Eigenfunctions for different Eigenvalues.

Therefore, every function can be built up from these Eigenfunctions, cf. H. Vogel, C. Gerthsen, H. O. Kneser, “Physik”, Springer Verlag, 1982. An arbitrary function can be represented as a linear combination of spherical harmonics Y_(n) ^(m)(Θ,Φ) with complex constants C_(n) ^(m):

$\begin{matrix}{{f\left( {\theta,\phi} \right)} = {\sum\limits_{n = 0}^{\infty}{\sum\limits_{m = {- n}}^{n}{C_{n}^{m} \cdot {Y_{n}^{m}\left( {\theta,\phi} \right)}}}}} & (13) \\{\left\langle {{f\left( {\theta,\phi} \right)}|{Y_{n^{\prime}}^{m^{\prime}}\left( {\theta,\phi} \right)}} \right\rangle = {\int_{0}^{2\;\pi}{\int_{0}^{\pi}{{f\left( {\theta,\phi} \right)}^{*}{Y_{n^{\prime}}^{m^{\prime}}\left( {\theta,\phi} \right)}\;\sin\;\theta\; d\;\theta\; d\;{\phi.}}}}} & (14)\end{matrix}$

The indices n,m are used in a deterministic way. They are substituted by a one-dimensional index j, and indices n′,m′ are substituted by an index i of the same size. Due to the fact that each subspace is orthogonal to a subspace with different i,j, they can be described as linearly independent, orthonormal unit vectors in an infinite-dimensional space:

$\begin{matrix}{\left\langle {{f\left( {\theta,\phi} \right)}❘{Y_{i}\left( {\theta,\phi} \right)}} \right\rangle = {\int_{0}^{2\;\pi}{\int_{0}^{\pi}{\left( {\sum\limits_{j = 0}^{\infty}{C_{j}{Y_{j}\left( {\theta,\phi} \right)}}} \right)^{*}{Y_{i}\left( {\theta,\phi} \right)}\;\sin\;\theta\; d\;\theta\; d\;{\phi.}}}}} & (15)\end{matrix}$

The constant values of C_(j) can be set in front of the integral:

$\begin{matrix}{\left\langle {{f\left( {\theta,\phi} \right)}❘{Y_{i}\left( {\theta,\phi} \right)}} \right\rangle = {\sum\limits_{j = 0}^{\infty}{C_{j}^{*}{\int_{0}^{2\;\pi}{\int_{0}^{\pi}{{Y_{j}^{*}\left( {\theta,\phi} \right)}{Y_{i}\left( {\theta,\phi} \right)}\sin\;\theta\; d\;\theta\; d\;{\phi.}}}}}}} & (16)\end{matrix}$

A mapping from one subspace (index j) into another subspace (index i) requires just an integration of the harmonics for the same indices i=j as long as the Eigenfunctions Y_(j) and Y_(i) are mutually orthogonal:

$\begin{matrix}{\left\langle {{f\left( {\theta,\phi} \right)}❘{Y_{i}\left( {\theta,\phi} \right)}} \right\rangle = {\sum\limits_{j = 0}^{\infty}{C_{j}^{*}{\left\langle {{Y_{j}\left( {\theta,\phi} \right)}❘{Y_{i}\left( {\theta,\phi} \right)}} \right\rangle.}}}} & (17)\end{matrix}$
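As a worked step, and assuming the spherical harmonics are normalised so that the inner product of Y_(j) with Y_(i) equals 1 for i=j and 0 otherwise (the symbol δ_(ji) is introduced here only for illustration), the sum in equation (17) collapses to a single term:

$\left\langle {f\left( {\theta,\phi} \right)}|{Y_{i}\left( {\theta,\phi} \right)} \right\rangle = \sum\limits_{j = 0}^{\infty}C_{j}^{*}\,\delta_{ji} = C_{i}^{*}.$

Under this assumption, each expansion coefficient can be read off, up to complex conjugation, from a single inner product with the matching harmonic.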

An essential aspect is that if there is a change from a continuous description to a bra/ket notation, the integral solution can be substituted by the sum of inner products between bra and ket descriptions of the spherical harmonics. In general, the inner product with a continuous basis can be used to map a discrete representation of a ket based wave description |x⟩ into a continuous representation. For example, x(ra) is the ket representation in the position basis (i.e. the radius) ra: x(ra) = ⟨ra|x⟩.  (18)

Looking at the different kinds of mode matrices Ψ and Ξ, the Singular Value Decomposition is used to handle arbitrary kinds of matrices.

Singular Value Decomposition

A singular value decomposition (SVD, cf. G. H. Golub, Ch. F. van Loan, “Matrix Computations”, The Johns Hopkins University Press, 3rd edition, 11 Oct. 1996) enables the decomposition of an arbitrary matrix A with m rows and n columns into three matrices U, Σ, and V^(†), see equation (19). In the original form, the matrices U and V^(†) are unitary matrices of the dimension m×m and n×n, respectively. Such matrices are orthonormal and are built up from orthogonal columns representing complex unit vectors |u_(i)⟩ and |v_(i)⟩^(†) = ⟨v_(i)|, respectively. Unitary matrices from the complex space are equivalent to orthogonal matrices in real space, i.e. their columns represent an orthonormal vector basis: A = UΣV^(†).  (19)

The matrices U and V contain orthonormal bases for all four subspaces.

-   first r columns of U: column space of A
-   last m−r columns of U: nullspace of A^(†)
-   first r columns of V: row space of A
-   last n−r columns of V: nullspace of A
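The decomposition of equation (19) and the four subspaces listed above can be checked numerically. The following sketch uses a randomly generated real rank-2 matrix as a stand-in; the matrix, the seed and the rank tolerance `1e-10` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Rank-2 matrix with m = 4 rows and n = 3 columns.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

# Full SVD: U is 4 x 4, Vh (= V^dagger) is 3 x 3, s holds the singular values.
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))   # numerical rank

# Rebuild the rectangular diagonal matrix Sigma (m x n).
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

V = Vh.conj().T
null_A = V[:, r:]     # last n - r columns of V: nullspace of A
null_Ah = U[:, r:]    # last m - r columns of U: nullspace of A^dagger
```

Multiplying `A` by `null_A`, or the adjoint of `A` by `null_Ah`, gives zero up to rounding, which confirms the subspace assignment.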

The matrix Σ contains all singular values which can be used to characterize the behaviour of A. In general, Σ is an m by n rectangular diagonal matrix with up to r non-zero diagonal elements σ_(i), where the rank r gives the number of linearly independent columns and rows of A (r≦min(m,n)). It contains the singular values in descending order, i.e. in equations (20) and (21) σ₁ has the highest and σ_(r) the lowest value.

In a compact form only r singular values, i.e. r columns of U and r rows of V^(†), are required for reconstructing the matrix A. The dimensions of the matrices U, Σ, and V^(†) differ from the original form. However, the Σ matrices always become square. Then, for m>n=r

$\begin{matrix}{{\begin{bmatrix}\text{****} \\\text{***} \\\text{***} \\A \\\text{***} \\\text{***} \\\text{***}\end{bmatrix}_{m \times n} = {\begin{bmatrix}\text{****} \\\text{***} \\\text{***} \\U \\\text{***} \\\text{***} \\\text{***}\end{bmatrix}_{m \times n} \cdot {\begin{bmatrix}\sigma_{1} & 0 & \ldots & \ldots \\0 & \sigma_{2} & 0 & \ldots \\0 & 0 & \ldots & 0 \\\ldots & \ldots & 0 & \sigma_{r}\end{bmatrix}_{n \times n}\begin{bmatrix}\text{****} \\\text{***} \\V^{\dagger} \\\text{***}\end{bmatrix}}_{n \times n}}},} & (20)\end{matrix}$

and for n>m=r

$\begin{matrix}{\begin{bmatrix}\text{*******} \\\text{******} \\A \\\text{******}\end{bmatrix}_{m \times n} = {\begin{bmatrix}\text{****} \\\text{***} \\U \\\text{***}\end{bmatrix}_{m \times m} \cdot {{\begin{bmatrix}\sigma_{1} & 0 & \ldots & \ldots \\0 & \sigma_{2} & 0 & \ldots \\0 & 0 & \ldots & 0 \\\ldots & \ldots & 0 & \sigma_{r}\end{bmatrix}_{m \times m}\begin{bmatrix}\text{*******} \\\text{******} \\V^{\dagger} \\\text{******}\end{bmatrix}}_{m \times n}.}}} & (21)\end{matrix}$

Thus the SVD can be implemented very efficiently by a low-rank approximation, see the above-mentioned Golub/van Loan textbook. This approximation describes the original matrix exactly but contains only up to r rank-1 matrices. With the Dirac notation the matrix A can be represented by r rank-1 outer products: A = Σ_(i=1) ^(r) σ_(i) |u_(i)⟩⟨v_(i)|.  (22)
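Equation (22) can be verified by summing the rank-1 outer products explicitly. The sketch below uses a random complex 5×3 matrix as an illustrative stand-in for a mode matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Compact SVD: U is 5 x 3, Vh (= V^dagger) is 3 x 3.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Column i of U is the ket |u_i>, row i of Vh is the bra <v_i|.
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
```

Dropping the terms with the smallest singular values from this sum gives a truncated (reduced rank) approximation of `A`.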

When looking at the encoder decoder chain in equation (11), not only mode matrices for the encoder like matrix Ξ are to be considered, but also inverses of mode matrices like matrix Ψ or another sophisticated decoder matrix. For a general matrix A, the pseudo inverse A⁺ of A can be obtained directly from the SVD by inverting the square matrix Σ and taking the complex conjugate transpose of U and V^(†), which results in: A⁺ = VΣ⁻¹U^(†).  (23)

For the vector based description of equation (22), the pseudo inverse A⁺ is obtained by taking the conjugate transpose of |u_(i)⟩ and ⟨v_(i)|, whereas the singular values σ_(i) have to be inverted. The resulting pseudo inverse looks as follows:

$\begin{matrix}{A^{+} = \sum\limits_{i = 1}^{r}\left( \frac{1}{\sigma_{i}} \right)|v_{i}\rangle\langle u_{i}|.} & (24)\end{matrix}$
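The vector based pseudo inverse of equation (24) can be compared against a library routine. This sketch assumes a full-rank matrix, so that r equals the number of singular values returned; the random 5×3 matrix is only an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = len(s)  # full rank assumed for this sketch

# |v_i> is the conjugate of row i of Vh, <u_i| is the conjugate of column i of U.
A_pinv = sum(
    (1.0 / s[i]) * np.outer(Vh[i, :].conj(), U[:, i].conj())
    for i in range(r)
)
```

The result agrees with `np.linalg.pinv(A)`; the reciprocal `1.0 / s[i]` shows directly why small singular values dominate the pseudo inverse.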

If the SVD based decomposition of the different matrices is combined with a vector based description (cf. equations (8) and (10)) one gets for the encoding process:

$\begin{matrix}{|a_{s}\rangle = \sum\limits_{s_{i} = 1}^{r_{s}}\sigma_{s_{i}}|u_{s_{i}}\rangle\langle v_{s_{i}}| \cdot |x\rangle = \sum\limits_{s_{i} = 1}^{r_{s}}\sigma_{s_{i}}|u_{s_{i}}\rangle\langle v_{s_{i}}|x\rangle ,} & (25)\end{matrix}$

and for the decoder when considering the pseudo inverse matrix Ψ⁺ (equation (24)):

$\begin{matrix}{|y\rangle = \left( \sum\limits_{l_{i} = 1}^{r_{l}}\left( \frac{1}{\sigma_{l_{i}}} \right)|v_{l_{i}}\rangle\langle u_{l_{i}}| \right)|a_{l}\rangle .} & (26)\end{matrix}$

If it is assumed that the Ambisonics sound field description |a_(s)⟩ from the encoder is nearly the same as |a_(l)⟩ for the decoder, and that the dimensions are r_(s)=r_(l)=r, then with respect to the input signal |x⟩ and the output signal |y⟩ a combined equation looks as follows:

$\begin{matrix}{|y\rangle = \left( \sum\limits_{l_{i} = 1}^{r}\left( \frac{1}{\sigma_{l_{i}}} \right)|v_{l_{i}}\rangle\langle u_{l_{i}}| \right)\sum\limits_{s_{i} = 1}^{r}\sigma_{s_{i}}|u_{s_{i}}\rangle\langle v_{s_{i}}|x\rangle .} & (27)\end{matrix}$

SUMMARY OF INVENTION

However, this combined description of the encoder decoder chain has some specific problems which are described in the following.

Influence on Ambisonics Matrices

Higher Order Ambisonics (HOA) mode matrices Ξ and Ψ are directly influenced by the position of the sound sources or the loudspeakers (see equation (6)) and their Ambisonics order. If the geometry is regular, i.e. the mutual angular distances between source or loudspeaker positions are nearly equal, equation (27) can be solved.

But in real applications this is often not true. Thus it makes sense to perform an SVD of Ξ and Ψ, and to investigate their singular values in the corresponding matrix Σ because it reflects the numerical behaviour of Ξ and Ψ. Σ is a positive definite matrix with real singular values. But nevertheless, even if there are up to r singular values, the numerical relationship between these values is very important for the reproduction of sound fields, because one has to build the inverse or pseudo inverse of matrices at decoder side. A suitable quantity for measuring this behaviour is the condition number of A. The condition number κ(A) is defined as the ratio of the largest to the smallest singular value:

$\begin{matrix}{{\kappa(A)} = {\frac{\sigma_{1}}{\sigma_{r}}.}} & (28)\end{matrix}$
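The following sketch computes a condition number from the singular values, taking it as the ratio of the largest to the smallest singular value. This is the convention under which an ill-conditioned matrix has a large κ, and it is also the default of `numpy.linalg.cond`. The helper name `condition_number` and the two test matrices are hypothetical.

```python
import numpy as np

def condition_number(A):
    """Ratio of the largest to the smallest singular value of A."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    return s[0] / s[-1]

# Orthonormal columns: perfectly conditioned.
regular = np.eye(3)

# Two nearly linearly dependent columns: ill-conditioned.
nearly_dependent = np.array([[1.0, 1.0],
                             [0.0, 1e-6],
                             [0.0, 0.0]])

kappa_regular = condition_number(regular)
kappa_ill = condition_number(nearly_dependent)
```

A matrix with orthonormal columns gives κ=1, whereas nearly parallel columns, as produced by closely spaced source or loudspeaker directions, drive κ up by several orders of magnitude.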

Inverse Problems

Ill-conditioned matrices are problematic because they have a large κ(A). In case of an inversion or pseudo inversion, an ill-conditioned matrix leads to the problem that small singular values σ_(i) become very dominant. In P. Ch. Hansen, “Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion”, Society for Industrial and Applied Mathematics (SIAM), 1998, two fundamental types of problems are distinguished (chapter 1.1, pages 2-3) by describing how singular values are decaying:

-   Rank-deficient problems, where the matrices have a gap between a cluster of large and a cluster of small singular values (non-gradual decay);
-   Discrete ill-posed problems, where on average all singular values of the matrices decay gradually to zero, i.e. without a gap in the singular value spectrum.

Concerning the geometry of microphones at encoder side as well as the loudspeaker geometry at decoder side, mainly the first type, the rank-deficient problem, will occur. However, it is easier to modify the positions of some microphones during the recording than to control all possible loudspeaker positions at customer side. Especially at decoder side an inversion or pseudo inversion of the mode matrix is to be performed, which leads to numerical problems and overemphasised values for the higher mode components (see the above-mentioned Hansen book).

Signal Related Dependency

The inversion problem can be reduced, for example, by reducing the rank of the mode matrix, i.e. by avoiding the smallest singular values. But then a threshold is to be used for the smallest possible value σ_(r) (cf. equations (20) and (21)). An optimal value for such a lowest singular value is described in the above-mentioned Hansen book. Hansen proposes

$\sigma_{opt} = \frac{1}{\sqrt{SNR}},$

which depends on the characteristic of the input signal (here described by |x⟩). From equation (27) it can be seen that this signal has an influence on the reproduction, but the signal dependency cannot be controlled in the decoder.
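A rank reduction based on the threshold 1/√SNR can be sketched as follows. Two points are assumptions of this sketch and are not fixed by the text above: the SNR is taken as a linear power ratio, and the singular values are compared with the threshold without any prior normalisation. The helper name `truncated_rank` and the example values are hypothetical.

```python
import numpy as np

def truncated_rank(singular_values, snr_linear):
    """Count the singular values at or above sigma_opt = 1 / sqrt(SNR).

    Assumes a linear (not dB) SNR and un-normalised singular values.
    """
    sigma_opt = 1.0 / np.sqrt(snr_linear)
    rank = int(np.sum(np.asarray(singular_values) >= sigma_opt))
    return rank, sigma_opt

# Illustrative singular values with a gap after the second value.
s = np.array([2.0, 0.5, 0.05, 0.001])
rank, sigma_opt = truncated_rank(s, snr_linear=100.0)
```

With an SNR of 100 the threshold is 0.1, so only the two largest singular values are kept and their reciprocals stay bounded in a subsequent pseudo inversion.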

Problems with Non-Orthonormal Basis

The state vector |a_(s)⟩, transmitted between the HOA encoder and the HOA decoder, is described in each system in a different basis according to equations (25) and (26). However, the state does not change if an orthonormal basis is used. Then the mode components can be projected from one basis to another. So, in principle, each loudspeaker setup or sound description should build on an orthonormal basis system because this allows the change of vector representations between these bases, e.g. in Ambisonics a projection from 3D space into the 2D subspace.

However, there are often setups with ill-conditioned matrices where the basis vectors are nearly linearly dependent. So, in principle, a non-orthonormal basis is to be dealt with. This complicates the change from one subspace to another subspace, which is necessary if the HOA sound field description shall be adapted to different loudspeaker setups, or if it is desired to handle different HOA orders and dimensions at encoder or decoder sides.

A typical problem for the projection onto a sparse loudspeaker set is that the sound energy is high in the vicinity of a loudspeaker and is low if the distance between these loudspeakers is large. So the location between different loudspeakers requires a panning function that balances the energy accordingly.

The problems described above can be circumvented by the inventive processing, and are solved by the method disclosed in claim 1. An apparatus that utilises this method is disclosed in claim 2.

According to the invention, a reciprocal basis for the encoding process in combination with an original basis for the decoding process are used with consideration of the lowest mode matrix rank, as well as truncated singular value decomposition. Because a bi-orthonormal system is represented, it is ensured that the product of encoder and decoder matrices preserves an identity matrix at least for the lowest mode matrix rank.

This is achieved by changing the ket based description to a representation based in the dual space, the bra space with reciprocal basis vectors, where every vector is the adjoint of a ket. It is realised by using the adjoint of the pseudo inverse of the mode matrices. ‘Adjoint’ means complex conjugate transpose.

Thus, the adjoint of the pseudo inversion is used already at encoder side as well as the adjoint decoder matrix. For the processing, orthonormal reciprocal basis vectors are used in order to be invariant for basis changes. Furthermore, this kind of processing allows input signal dependent influences to be considered, leading to noise reduction optimal thresholds for the σ_(i) in the regularisation process.

In principle, the inventive method is suited for Higher Order Ambisonics encoding and decoding using Singular Value Decomposition, said method including the steps:

-   receiving an audio input signal;
-   based on direction values of sound sources and the Ambisonics order of said audio input signal, forming corresponding ket vectors of spherical harmonics and a corresponding encoder mode matrix;
-   carrying out on said encoder mode matrix a Singular Value Decomposition, wherein two corresponding encoder unitary matrices and a corresponding encoder diagonal matrix containing singular values and a related encoder mode matrix rank are output;
-   determining from said audio input signal, said singular values and said encoder mode matrix rank a threshold value;
-   comparing at least one of said singular values with said threshold value and determining a corresponding final encoder mode matrix rank;
-   based on direction values of loudspeakers and a decoder Ambisonics order, forming corresponding ket vectors of spherical harmonics for specific loudspeakers located at directions corresponding to said direction values and a corresponding decoder mode matrix;
-   carrying out on said decoder mode matrix a Singular Value Decomposition, wherein two corresponding decoder unitary matrices and a corresponding decoder diagonal matrix containing singular values are output and a corresponding final rank of said decoder mode matrix is determined;
-   determining from said final encoder mode matrix rank and said final decoder mode matrix rank a final mode matrix rank;
-   calculating from said encoder unitary matrices, said encoder diagonal matrix and said final mode matrix rank an adjoint pseudo inverse of said encoder mode matrix, resulting in an Ambisonics ket vector, and reducing the number of components of said Ambisonics ket vector according to said final mode matrix rank, so as to provide an adapted Ambisonics ket vector;
-   calculating from said adapted Ambisonics ket vector, said decoder unitary matrices, said decoder diagonal matrix and said final mode matrix rank an adjoint decoder mode matrix resulting in a ket vector of output signals for all loudspeakers.

In principle the inventive apparatus is suited for Higher Order Ambisonics encoding and decoding using Singular Value Decomposition, said apparatus including means being adapted for:

-   receiving an audio input signal;
-   based on direction values of sound sources and the Ambisonics order of said audio input signal, forming corresponding ket vectors of spherical harmonics and a corresponding encoder mode matrix;
-   carrying out on said encoder mode matrix a Singular Value Decomposition, wherein two corresponding encoder unitary matrices and a corresponding encoder diagonal matrix containing singular values and a related encoder mode matrix rank are output;
-   determining from said audio input signal, said singular values and said encoder mode matrix rank a threshold value;
-   comparing at least one of said singular values with said threshold value and determining a corresponding final encoder mode matrix rank;
-   based on direction values of loudspeakers and a decoder Ambisonics order, forming corresponding ket vectors of spherical harmonics for specific loudspeakers located at directions corresponding to said direction values and a corresponding decoder mode matrix;
-   carrying out on said decoder mode matrix a Singular Value Decomposition, wherein two corresponding decoder unitary matrices and a corresponding decoder diagonal matrix containing singular values are output and a corresponding final rank of said decoder mode matrix is determined;
-   determining from said final encoder mode matrix rank and said final decoder mode matrix rank a final mode matrix rank;
-   calculating from said encoder unitary matrices, said encoder diagonal matrix and said final mode matrix rank an adjoint pseudo inverse of said encoder mode matrix, resulting in an Ambisonics ket vector, and reducing the number of components of said Ambisonics ket vector according to said final mode matrix rank, so as to provide an adapted Ambisonics ket vector;
-   calculating from said adapted Ambisonics ket vector, said decoder unitary matrices, said decoder diagonal matrix and said final mode matrix rank an adjoint decoder mode matrix resulting in a ket vector of output signals for all loudspeakers.

Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.

BRIEF DESCRIPTION OF DRAWINGS

Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

FIG. 1 Block diagram of HOA encoder and decoder based on SVD;

FIG. 2 Block diagram of HOA encoder and decoder including linear functional panning;

FIG. 3 Block diagram of HOA encoder and decoder including matrix panning;

FIG. 4 Flow diagram for determining threshold value σ_(ε);

FIG. 5 Recalculation of singular values in case of a reduced mode matrix rank r_(fin_e), and computation of |a′_(s)⟩;

FIG. 6 Recalculation of singular values in case of reduced mode matrix ranks r_(fin_e) and r_(fin_d), and computation of loudspeaker signals |y(Ω_(l))⟩ with or without panning.

DESCRIPTION OF EMBODIMENTS

A block diagram for the inventive HOA processing based on SVD isdepicted in FIG. 1 with the encoder part and the decoder part. Bothparts are using the SVD in order to generate the reciprocal basisvectors. There are changes with respect to known mode matchingsolutions, e.g. the change related to equation (27).

HOA Encoder

To work with reciprocal basis vectors, the ket based description is changed to the bra space, where every vector is the Hermitean conjugate or adjoint of a ket. It is realised by using the pseudo inversion of the mode matrices.

Then, according to equation (8), the (dual) bra based Ambisonics vector can also be reformulated with the (dual) mode matrix Ξ_(d):

$\begin{matrix}{{\left\langle a_{s} \right| = {\left\langle x \right|\Xi_{d}} = {\left\langle x \right|\Xi^{+}}}.} & (29)\end{matrix}$

The resulting Ambisonics vector at encoder side ⟨a_(s)| is now in the bra semantic. However, a unified description is desired, i.e. a return to the ket semantic. Instead of the pseudo inverse of Ξ, the Hermitean conjugate Ξ_(d) ^(†), i.e. Ξ⁺ ^(†), is used:

$\begin{matrix}{{\left| a_{s} \right\rangle = {\Xi_{d}^{\dagger}\left| x \right\rangle} = {\Xi^{+ \dagger}\left| x \right\rangle}}.} & (30)\end{matrix}$

According to equation (24)

$\begin{matrix}{{\Xi^{+ \dagger} = {\left( {\sum\limits_{i = 1}^{r_{s}}{\left( \frac{1}{\sigma_{s_{i}}} \right)\left. v_{s_{i}} \right\rangle\left\langle u_{s_{i}} \right.}} \right)^{\dagger} = {\sum\limits_{i = 1}^{r_{s}}{\left( \frac{1}{\sigma_{s_{i}}} \right)\left. u_{s_{i}} \right\rangle\left\langle v_{s_{i}} \right.}}}},} & (31)\end{matrix}$

where all singular values are real and the complex conjugation of σ_(s)_(i) can be neglected.

This leads to the following description of the Ambisonics components:

$\begin{matrix}{\left. a_{s} \right\rangle = {\sum\limits_{i = 1}^{r_{s}}{\left( \frac{1}{\sigma_{s_{i}}} \right)\left. u_{s_{i}} \right\rangle{\left\langle {v_{s_{i}}❘x} \right\rangle.}}}} & (32)\end{matrix}$

The vector based description for the source side reveals that |a_(s)⟩ depends on the inverse σ_(s) _(i) . If this is done for the encoder side, it is to be changed to corresponding dual basis vectors at decoder side.
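Equation (32) can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated complex matrix standing in for the encoder mode matrix Ξ (real mode matrices would be built from spherical harmonics); the function name `encode_adjoint_pinv` and the numerical-rank tolerance are illustrative assumptions, not part of the described method.

```python
import numpy as np

def encode_adjoint_pinv(Xi, x):
    """Evaluate |a_s> = sum_i (1/sigma_i) |u_i> <v_i|x> for an O x S matrix Xi."""
    U, s, Vh = np.linalg.svd(Xi, full_matrices=False)
    # Numerical rank with a conventional tolerance (assumption for this sketch).
    tol = s[0] * max(Xi.shape) * np.finfo(s.dtype).eps
    r = int(np.sum(s > tol))
    a = np.zeros(Xi.shape[0], dtype=complex)
    for i in range(r):
        inner = Vh[i, :] @ x          # <v_i|x>, row i of V^dagger applied to x
        a += (1.0 / s[i]) * U[:, i] * inner
    return a

rng = np.random.default_rng(0)
O, S = 9, 4                           # illustrative sizes only
Xi = rng.standard_normal((O, S)) + 1j * rng.standard_normal((O, S))
x = rng.standard_normal(S) + 1j * rng.standard_normal(S)

a_s = encode_adjoint_pinv(Xi, x)
a_ref = np.linalg.pinv(Xi).conj().T @ x   # adjoint of the pseudo inverse applied to |x>
```

Both routes should agree up to rounding, which illustrates that the adjoint pseudo inverse can be assembled directly from the SVD factors.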

HOA Decoder

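In case the decoder is originally based on the pseudo inverse, one gets for deriving the loudspeaker signals |y⟩:

$\begin{matrix}{{\left| a_{l} \right\rangle = {\Psi^{+ \dagger}\left| y \right\rangle}},} & (33)\end{matrix}$

i.e. the loudspeaker signals are:

$\begin{matrix}{{\left| y \right\rangle = {\left( \Psi^{+ \dagger} \right)^{+} \cdot \left| a_{l} \right\rangle} = {\Psi^{\dagger} \cdot \left| a_{l} \right\rangle}}.} & (34)\end{matrix}$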

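Considering equation (22), the decoder equation results in:

$\begin{matrix}{{\left| y \right\rangle = {\left( {\sum\limits_{i = 1}^{r}{\sigma_{l_{i}}\left| u_{l_{i}} \right\rangle\left\langle v_{l_{i}} \right|}} \right)^{\dagger}\left| a_{l} \right\rangle}}.} & (35)\end{matrix}$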

Therefore, instead of building a pseudo inverse, only an adjoint operation (denoted by ‘†’) remains in equation (35). This means that fewer arithmetical operations are required in the decoder, because one only has to switch the sign of the imaginary parts, and the transposition is only a matter of modified memory access:

$\begin{matrix}{\left. y \right\rangle = {\left( {\sum\limits_{i = 1}^{r}{{\sigma_{l_{i}} \cdot \left. v_{l_{i}} \right\rangle}\left\langle u_{l_{i}} \right.}} \right){\left. a_{l} \right\rangle.}}} & (36)\end{matrix}$

If it is assumed that the Ambisonics representations of the encoder and the decoder are nearly the same, i.e. |a_(s)⟩=|a_(l)⟩, with equation (32) the complete encoder decoder chain gets the following dependency:

$\begin{matrix}{{\left. y \right\rangle = {\sum\limits_{i = 1}^{r}{{\left( \frac{\sigma_{l_{i}}}{\sigma_{s_{i}}} \right) \cdot \left. v_{l_{i}} \right\rangle}\left\langle {u_{l_{i}}❘u_{s_{i}}} \right\rangle\left\langle {v_{s_{i}}❘x} \right\rangle}}},} & (37) \\{\left. y \right\rangle = {\sum\limits_{i = 1}^{r}{\left( \frac{\sigma_{l_{i}}}{\sigma_{s_{i}}} \right){\left\langle {u_{l_{i}}❘u_{s_{i}}} \right\rangle \cdot \left. v_{l_{i}} \right\rangle}{\left\langle {v_{s_{i}}❘x} \right\rangle.}}}} & (38)\end{matrix}$

In a real scenario the panning matrix G from equation (11) and a finite Ambisonics order are to be considered. The latter leads to a limited number of linear combinations of basis vectors which are used for describing the sound field. Furthermore, the linear independence of basis vectors is influenced by additional error sources, like numerical rounding errors or measurement errors. From a practical point of view, this can be circumvented by a numerical rank (see the above-mentioned Hansen book, chapter 3.1), which ensures that all basis vectors are linearly independent within certain tolerances.

To be more robust against noise, the SNR of input signals is considered, which affects the encoder ket and the calculated Ambisonics representation of the input. So, if necessary, i.e. for ill-conditioned mode matrices that are to be inverted, the σ_(i) value is regularised according to the SNR of the input signal in the encoder.

Regularisation in the Encoder

Regularisation can be performed in different ways, e.g. by using a threshold via the truncated SVD. The SVD provides the σ_(i) in descending order, where the σ_(i) with lowest level or highest index (denoted σ_(r)) contains the components that switch very frequently and lead to noise effects and a reduced SNR (cf. equations (20) and (21) and the above-mentioned Hansen textbook). Thus a truncated SVD (TSVD) compares all σ_(i) values with a threshold value and neglects the noisy components which are below that threshold value σ_(ε). The threshold value σ_(ε) can be fixed or can be optimally modified according to the SNR of the input signals.

The trace of a matrix means the sum of all diagonal matrix elements.

The TSVD block (10, 20, 30 in FIGS. 1 to 3) has the following tasks:

-   computing the mode matrix rank r;
-   removing the noisy components below the threshold value and setting the final mode matrix rank r_(fin).

The processing deals with complex matrices Ξ and Ψ. However, for regularising the real valued σ_(i), these matrices cannot be used directly. A proper value comes from the product of Ξ with its adjoint Ξ^(†). The resulting matrix is square with real eigenvalues which are equal to the squares of the corresponding singular values. If the sum of all eigenvalues, which can be described by the trace of matrix Σ²,

trace(Σ²)=Σ_(i=1) ^(r)σ_(i) ²,  (39)

stays fixed, the physical properties of the system are conserved. This also applies for matrix Ψ.

Thus block ONB_(s) at the encoder side (15, 25, 35 in FIG. 1-3) or block ONB_(l) at the decoder side (19, 29, 39 in FIG. 1-3) modify the singular values so that trace(Σ²) before and after regularisation is conserved (cf. FIG. 5 and FIG. 6):

-   Modify the rest of σ_(i) (for i=1 . . . r_(fin)) such that the trace of the original and the aimed truncated matrix Σ_(t) stays fixed (trace(Σ²)=trace(Σ_(t) ²)).
-   Calculate a constant value Δσ that fulfils

    Σ_(i=1) ^(r)σ_(i) ²=Σ_(i=1) ^(r_fin)(σ_(i)+Δσ)².  (40)

If the difference between the total energy and the reduced total energy of the singular values is called ΔE=trace(Σ²)−trace(Σ_(r) _(fin) ²), the resulting value is as follows:

$\begin{matrix}{{\Delta\;\sigma} = {{\frac{1}{r_{fin}}\left( {{- {\sum\limits_{i = 1}^{r_{fin}}\sigma_{i}}} + \sqrt{\left\lbrack {\sum\limits_{i = 1}^{r_{fin}}\sigma_{i}} \right\rbrack^{2} + {r_{fin}\Delta\; E}}} \right)} = {\frac{1}{r_{fin}}\left( {{- {{trace}(\Sigma)}_{r_{fin}}} + \sqrt{{{trace}(\Sigma)}_{r_{fin}}^{2} + {r_{fin}\Delta\; E}}} \right)}}} & (41)\end{matrix}$

-   Re-calculate all new singular values σ_(i,t) for the truncated matrix Σ_(t):

    σ_(i,t)=σ_(i)+Δσ.  (42)
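The trace-preserving recalculation of equations (40) to (42) can be sketched as follows. This is an illustrative NumPy example with made-up singular values; the helper name `regularise_singular_values` is an assumption. Δσ is the positive root of the quadratic r_fin·Δσ² + 2·Δσ·Σσ_i − ΔE = 0 that follows from expanding equation (40).

```python
import numpy as np

def regularise_singular_values(sigma, r_fin):
    """Keep the first r_fin singular values and shift them by a constant
    delta_sigma so that the sum of squares (trace of Sigma^2) is unchanged."""
    sigma = np.asarray(sigma, dtype=float)
    kept = sigma[:r_fin]
    # Energy lost by truncation: trace(Sigma^2) - trace(Sigma_rfin^2)
    delta_e = np.sum(sigma**2) - np.sum(kept**2)
    t = np.sum(kept)                  # sum of the kept singular values
    delta_sigma = (-t + np.sqrt(t**2 + r_fin * delta_e)) / r_fin   # equation (41)
    return kept + delta_sigma, delta_sigma                         # equation (42)

sigma = np.array([3.0, 2.0, 0.5, 0.1])   # example values, descending
sigma_t, delta_sigma = regularise_singular_values(sigma, r_fin=2)
energy_before = np.sum(sigma**2)
energy_after = np.sum(sigma_t**2)
```

For these example values the kept singular values are each raised by a small positive amount, and `energy_after` matches `energy_before` up to rounding.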

Additionally, a simplification can be achieved for the encoder and the decoder if the basis for the appropriate |a⟩ (see equations (30) or (33)) is changed into the corresponding SVD-related {U^(†)} basis, leading to:

$\begin{matrix}{\left. a^{\prime} \right\rangle = {{\sum\limits_{i = 1}^{rfin}{{\left\langle u_{i} \right.\left\lbrack {\sum\limits_{i = 1}^{rfin}{\sigma_{i,t}\left. u_{i} \right\rangle\left\langle v_{i} \right.}} \right\rbrack}\left. a \right\rangle}} = {\sum\limits_{i = 1}^{rfin}{\sigma_{i,t}\left\langle {v_{i}❘a} \right\rangle}}}} & (43)\end{matrix}$

(remark: if σ_(i) and |a⟩ are used without additional encoder or decoder index, they refer to encoder side or/and to decoder side). This basis is orthonormal so that it preserves the norm of |a⟩. I.e., instead of |a⟩ the regularisation can use |a′⟩, which requires matrices Σ and V but no longer matrix U.

-   Use of the reduced ket |a′⟩ in the {U^(†)} basis, which has the advantage that the rank is indeed reduced.

Therefore in the invention the SVD is used on both sides, not only for obtaining the orthonormal basis and the singular values of the individual matrices Ξ and Ψ, but also for getting their ranks r_(fin).

Component Adaption

By considering the source rank of Ξ or by neglecting some of the corresponding σ_(s) with respect to the threshold or the final source rank, the number of components can be reduced and a more robust encoding matrix can be provided. Therefore, an adaption of the number of transmitted Ambisonics components according to the corresponding number of components at decoder side is performed. Normally, it depends on Ambisonics order O. Here, the final mode matrix rank r_(fin) _(e) obtained from the SVD block for the encoder matrix Ξ and the final mode matrix rank r_(fin) _(d) obtained from the SVD block for the decoder matrix Ψ are to be considered. In Adapt#Comp step/stage 16 the number of components is adapted as follows:

-   r_(fin) _(e) =r_(fin) _(d) : nothing changed, no compression;
-   r_(fin) _(e) <r_(fin) _(d) : compression, neglect r_(fin) _(d) −r_(fin) _(e) columns in the decoder matrix Ψ^(†) => encoder and decoder operations reduced;
-   r_(fin) _(e) >r_(fin) _(d) : cancel r_(fin) _(e) −r_(fin) _(d) components of the Ambisonics state vector before transmission, i.e. compression. Neglect r_(fin) _(e) −r_(fin) _(d) rows in the encoder matrix Ξ => encoder and decoder operations reduced.

The result is that the final mode matrix rank r_(fin) to be used at encoder side and at decoder side is the smaller one of r_(fin) _(d) and r_(fin) _(e) .
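The three cases collapse into taking the minimum of both ranks. A minimal sketch, assuming the components of the Ambisonics ket are ordered like the descending singular values so that the first r_(fin) entries are the ones kept (this ordering, and the helper name `adapt_components`, are assumptions of the example):

```python
import numpy as np

def adapt_components(a_s, r_fin_e, r_fin_d):
    """Return the adapted ket and the common final rank min(r_fin_e, r_fin_d)."""
    r_fin = min(r_fin_e, r_fin_d)
    # Keep only the first r_fin components (assumed ordering, see lead-in).
    return a_s[:r_fin], r_fin

a_s_example = np.arange(5, dtype=complex)       # placeholder ket with 5 components
a_l_example, r_fin = adapt_components(a_s_example, r_fin_e=5, r_fin_d=3)
```

Here the encoder rank exceeds the decoder rank, so two components are cancelled before transmission.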

Thus, if a bidirectional signal between encoder and decoder exists for interchanging the rank of the other side, one can use the rank differences to improve a possible compression and to reduce the number of operations in the encoder and in the decoder.

Consider Panning Functions

The use of panning functions ƒ_(s), ƒ_(l) or of the panning matrix G was mentioned earlier, see equation (11), due to the problems concerning the energy distribution which arise for sparse and irregular loudspeaker setups. These problems are related to the limited order that can normally be used in Ambisonics (see sections Influence on Ambisonics matrices to Problems with non-orthonormal basis).

Regarding the requirements for panning matrix G, following encoding it is assumed that the sound field of some acoustic sources is well represented by the Ambisonics state vector |a_(s)⟩. However, at decoder side it is not known exactly how the state has been prepared. I.e., there is no complete knowledge about the present state of the system.

Therefore the reciprocal basis is taken for preserving the inner product between equations (9) and (8).

Using the pseudo inverse already at encoder side provides the following advantages:

-   use of the reciprocal basis satisfies bi-orthogonality between encoder and decoder basis (⟨x^(i)|x_(j)⟩=δ_(j) ^(i));
-   smaller number of operations in the encoding/decoding chain;
-   improved numerical aspects concerning SNR behaviour;
-   orthonormal columns in the modified mode matrices instead of only linearly independent ones;
-   it simplifies the change of the basis;
-   use of a rank-1 approximation leads to less memory effort and a reduced number of operations, especially if the final rank is low. In general, for an M×N matrix, instead of M*N only M+N operations are required;
-   it simplifies the adaptation at decoder side because the pseudo inverse in the decoder can be avoided;
-   the inverse problems with numerically unstable σ can be circumvented.

In FIG. 1, at encoder or sender side, s=1, . . . , S different direction values Ω_(s) of sound sources and the Ambisonics order N_(s) are input to a step or stage 11 which forms therefrom corresponding ket vectors |Y(Ω_(s))⟩ of spherical harmonics and an encoder mode matrix Ξ_(O×S) having the dimension O×S. Matrix Ξ_(O×S) is generated in correspondence to the input signal vector |x(Ω_(s))⟩, which comprises S source signals for different directions Ω_(s). Therefore matrix Ξ_(O×S) is a collection of spherical harmonic ket vectors |Y(Ω_(s))⟩. Because not only the signal x(Ω_(s)), but also the position varies with time, the calculation of matrix Ξ_(O×S) can be performed dynamically. This matrix has a non-orthonormal basis NONB_(S) for sources. From the input signal |x(Ω_(s))⟩ and a rank value r_(s) a specific singular threshold value σ_(ε) is determined in step or stage 12. The encoder mode matrix Ξ_(O×S) and threshold value σ_(ε) are fed to a truncated singular value decomposition TSVD processing (cf. above section Singular value decomposition), which performs in step or stage 13 a singular value decomposition for mode matrix Ξ_(O×S) in order to get its singular values, whereby on one hand the unitary matrices U and V^(†) and the diagonal matrix Σ containing r_(s) singular values σ₁ . . . σ_(r) _(s) are output and on the other hand the related encoder mode matrix rank r_(s) is determined (Remark: σ_(i) is the i-th singular value from matrix Σ of SVD(Ξ)=UΣV^(†)).

In step/stage 12 the threshold value σ_(ε) is determined according to section Regularisation in the encoder. Threshold value σ_(ε) can limit the number of used σ_(s) _(i) values to the truncated or final encoder mode matrix rank r_(fin) _(e) . Threshold value σ_(ε) can be set to a predefined value, or can be adapted to the signal-to-noise ratio SNR of the input signal:

${\sigma_{\varepsilon,{opt}} = \frac{1}{\sqrt{SNR}}},$

whereby the SNR of all S source signals |x(Ω_(s))⟩ is measured over a predefined number of sample values.

In a comparator step or stage 14 the singular value σ_(r) from matrix Σ is compared with the threshold value σ_(ε), and from that comparison the truncated or final encoder mode matrix rank r_(fin) _(e) is calculated, according to which the rest of the σ_(s) _(i) values are modified as described in section Regularisation in the encoder. The final encoder mode matrix rank r_(fin) _(e) is fed to a step or stage 16.
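The threshold and comparator stages can be sketched as follows. Two points are assumptions of this example rather than statements of the text: the SNR is taken as a linear power ratio (not in dB), and singular values equal to the threshold are kept. The function names are hypothetical.

```python
import numpy as np

def threshold_from_snr(snr_linear):
    """sigma_eps = 1 / sqrt(SNR), with SNR as a linear power ratio (assumption)."""
    return 1.0 / np.sqrt(snr_linear)

def final_rank(sigma, sigma_eps):
    """Count the singular values that are not below the threshold."""
    sigma = np.asarray(sigma, dtype=float)
    return int(np.count_nonzero(sigma >= sigma_eps))

sigma_s = [3.0, 2.0, 0.5, 0.05]          # example singular values, descending
sigma_eps = threshold_from_snr(100.0)    # SNR of 100 (20 dB) gives a threshold of about 0.1
r_fin_e = final_rank(sigma_s, sigma_eps)
```

With these example numbers the smallest singular value falls below the threshold, so the final encoder rank is one less than the full rank.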

Regarding the decoder side, from l=1, . . . , L direction values Ω_(l) of loudspeakers and from the decoder Ambisonics order N_(l), corresponding ket vectors |Y(Ω_(l))⟩ of spherical harmonics for specific loudspeakers at directions Ω_(l) as well as a corresponding decoder mode matrix Ψ_(O×L) having the dimension O×L are determined in step or stage 18, in correspondence to the loudspeaker positions of the related signals |y(Ω_(l))⟩ in block 17. Similar to the encoder matrix Ξ_(O×S), decoder matrix Ψ_(O×L) is a collection of spherical harmonic ket vectors |Y(Ω_(l))⟩ for all directions Ω_(l). The calculation of Ψ_(O×L) is performed dynamically.

In step or stage 19 a singular value decomposition processing is carried out on decoder mode matrix Ψ_(O×L) and the resulting unitary matrices U and V^(†) as well as diagonal matrix Σ are fed to block 17. Furthermore, a final decoder mode matrix rank r_(fin) _(d) is calculated and is fed to step/stage 16.

In step or stage 16 the final mode matrix rank r_(fin) is determined, as described above, from final encoder mode matrix rank r_(fin) _(e) and from final decoder mode matrix rank r_(fin) _(d) . Final mode matrix rank r_(fin) is fed to step/stage 15 and to step/stage 17.

Encoder-side matrices U_(s), V_(s) ^(†), Σ_(s), rank value r_(s), final mode matrix rank value r_(fin) and the time dependent input signal ket vector |x(Ω_(s))⟩ of all source signals are fed to a step or stage 15, which calculates using equation (32) from these Ξ_(O×S) related input values the adjoint pseudo inverse (Ξ⁺)^(†) of the encoder mode matrix. This matrix has the dimension r_(fin) _(e) ×S and an orthonormal basis for sources ONB_(s). When dealing with complex matrices and their adjoints, the following is considered: trace(Ξ_(O×S) ^(†)Ξ_(O×S))=trace(Σ²)=Σ_(i=1) ^(r)σ_(s) _(i) ². Step/stage 15 outputs the corresponding time-dependent Ambisonics ket or state vector |a′_(s)⟩, cf. above section HOA encoder.

In step or stage 16 the number of components of |a′_(s)⟩ is reduced using final mode matrix rank r_(fin) as described in above section Component adaption, so as to possibly reduce the amount of transmitted information, resulting in time-dependent Ambisonics ket or state vector |a′_(l)⟩ after adaption.

From Ambisonics ket or state vector |a′_(l)⟩, from the decoder-side matrices U_(l) ^(†), V_(l), Σ_(l) and the rank value r_(l) derived from mode matrix Ψ_(O×L), and from the final mode matrix rank value r_(fin) from step/stage 16 an adjoint decoder mode matrix (Ψ)^(†) having the dimension L×r_(fin) _(d) and an orthonormal basis for loudspeakers ONB_(l) is calculated, resulting in a ket vector |y(Ω_(l))⟩ of time-dependent output signals of all loudspeakers, cf. above section HOA decoder. The decoding is performed with the conjugate transpose of the normal mode matrix, which relies on the specific loudspeaker positions. For an additional rendering a specific panning matrix should be used.

The decoder is represented by steps/stages 18, 19 and 17. The encoder is represented by the other steps/stages.

Steps/stages 11 to 19 of FIG. 1 correspond in principle to steps/stages 21 to 29 in FIG. 2 and steps/stages 31 to 39 in FIG. 3, respectively.

In FIG. 2, in addition, a panning function ƒ_(s) for the encoder side calculated in step or stage 211 and a panning function ƒ_(l) for the decoder side calculated in step or stage 281 are used for linear functional panning. Panning function ƒ_(s) is an additional input signal for step/stage 21, and panning function ƒ_(l) is an additional input signal for step/stage 28. The reason for using such panning functions is described in above section Consider panning functions.

In comparison to FIG. 1, in FIG. 3 a panning matrix G controls a panning processing 371 on the preliminary ket vector of time-dependent output signals of all loudspeakers at the output of step/stage 37. This results in the adapted ket vector |y(Ω_(l))⟩ of time-dependent output signals of all loudspeakers.

FIG. 4 shows in more detail the processing for determining threshold value σ_(ε) based on the singular value decomposition SVD processing 40 of encoder mode matrix Ξ_(O×S). That SVD processing delivers matrix Σ (containing in its descending diagonal all singular values σ_(i) running from σ₁ to σ_(r) _(s) , see equations (20) and (21)) and the rank r_(s) of matrix Σ.

In case a fixed threshold is used (block 41), within a loop controlled by variable i (blocks 42 and 43), which loop starts with i=1 and can run up to i=r_(s), it is checked (block 45) whether there is an amount value gap in between these σ_(i) values. Such a gap is assumed to occur if the amount value of a singular value σ_(i+1) is significantly smaller, for example smaller than 1/10, than the amount value of its predecessor singular value σ_(i). When such a gap is detected, the loop stops and the threshold value σ_(ε) is set (block 46) to the current singular value σ_(i). In case i=r_(s) (block 44), the lowest singular value σ_(i)=σ_(r) is reached, the loop is exited and σ_(ε) is set (block 46) to σ_(r).
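The gap search of FIG. 4 can be sketched in a few lines. The example uses 0-based indexing and the 1/10 ratio mentioned in the text as a default; the function name `gap_threshold` and the example values are illustrative assumptions.

```python
def gap_threshold(sigma, ratio=0.1):
    """Return sigma_eps for singular values given in descending order.

    The loop stops at the first sigma[i] whose successor is smaller than
    ratio * sigma[i]; if no such gap exists, the lowest value is returned."""
    r_s = len(sigma)
    for i in range(r_s - 1):
        if sigma[i + 1] < ratio * sigma[i]:
            return sigma[i]           # gap detected after sigma[i]
    return sigma[r_s - 1]             # no gap: threshold is the lowest singular value

eps_with_gap = gap_threshold([4.0, 3.0, 0.2, 0.1])   # gap between 3.0 and 0.2
eps_no_gap = gap_threshold([4.0, 3.0, 2.0])          # no gap found
```

In the first call the threshold is set to 3.0, so only the two largest singular values survive a subsequent comparison; in the second call the threshold equals the smallest singular value and nothing is truncated.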

In case a fixed threshold is not used (block 41), a block of T samples for all S source signals X=[|x(Ω_(s),t=0)⟩, . . . , |x(Ω_(s),t=T)⟩] (= matrix S×T) is investigated (block 47). The signal-to-noise ratio SNR for X is calculated (block 48) and the threshold value σ_(ε) is set to

$\sigma_{\varepsilon} = \frac{1}{\sqrt{SNR}}$

(block 49).

FIG. 5 shows within step/stage 15, 25, 35 the recalculation of singular values in case of reduced mode matrix rank r_(fin), and the computation of |a′_(s)⟩. The encoder diagonal matrix Σ_(s) from block 10/20/30 in FIG. 1/2/3 is fed to a step or stage 51 which calculates using value r_(s) the total energy trace(Σ²)=Σ_(i=1) ^(r) ^(s) σ_(s) _(i) ², to a step or stage 52 which calculates using value r_(fin) _(e) the reduced total energy

${{{trace}\left( \Sigma_{r_{{fin}_{e}}}^{2} \right)} = {\sum\limits_{i = 1}^{r_{{fin}_{e}}}\sigma_{s_{i}}^{2}}},$

and to a step or stage 54. The difference ΔE between the total energy value and the reduced total energy value, value trace(Σ_(r_(fin_(e)))) and value r_(fin) _(e) are fed to a step or stage 53 which calculates

${\Delta\;\sigma} = {\frac{1}{r_{{fin}_{e}}}{\left( {{{- {trace}}\;\left( \Sigma_{r_{{fin}_{e}}} \right)} + \sqrt{\left\lbrack {{trace}\left( \Sigma_{r_{{fin}_{e}}} \right)} \right\rbrack^{2} + {r_{{fin}_{e}}\Delta\; E}}} \right).}}$

Value Δσ is required in order to ensure that the energy which is described by trace(Σ²)=Σ_(i=1) ^(r)σ_(l) _(i) ² is kept such that the result makes sense physically. If at encoder or at decoder side the energy is reduced due to matrix reduction, such loss of energy is compensated for by value Δσ, which is distributed to all remaining matrix elements in an equal manner, i.e. Σ_(i=1) ^(r) ^(fin) (σ_(i)+Δσ)²=Σ_(i=1) ^(r)(σ_(i))².

Step or stage 54 calculates

$\Sigma_{t}^{+} = {\sum\limits_{i = 1}^{r_{{fin}_{e}}}{\frac{1}{\left( {\sigma_{s_{i}} + {\Delta\;\sigma}} \right)}I}}$

from Σ_(s), Δσ and r_(fin) _(e) .

Input signal vector |x(Ω_(s))⟩ is multiplied by matrix V_(s) ^(†). The result is multiplied by Σ_(t) ⁺. The latter multiplication result is ket vector |a′_(s)⟩.
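The processing chain of FIG. 5 (stages 51 to 54 followed by the two multiplications) can be sketched end to end. This is a hedged NumPy illustration with a random stand-in for the encoder mode matrix; `encode_reduced` is a hypothetical name, and Σ_(t) ⁺ is treated as a diagonal matrix holding the values 1/(σ_(s_i)+Δσ).

```python
import numpy as np

def encode_reduced(Xi, x, r_fin):
    """Compute |a'_s> = Sigma_t^+ V^dagger |x> with r_fin retained components."""
    U, s, Vh = np.linalg.svd(Xi, full_matrices=False)
    kept = s[:r_fin]
    delta_e = np.sum(s**2) - np.sum(kept**2)          # stages 51 and 52
    t = np.sum(kept)
    delta_sigma = (-t + np.sqrt(t**2 + r_fin * delta_e)) / r_fin   # stage 53
    sigma_t = kept + delta_sigma
    # Stage 54 and the multiplications: divide V^dagger x by the shifted values.
    return (Vh[:r_fin, :] @ x) / sigma_t

rng = np.random.default_rng(2)
O, S = 9, 4                                           # illustrative sizes only
Xi = rng.standard_normal((O, S)) + 1j * rng.standard_normal((O, S))
x = rng.standard_normal(S) + 1j * rng.standard_normal(S)

a_reduced = encode_reduced(Xi, x, r_fin=2)            # truncated case
a_full = encode_reduced(Xi, x, r_fin=S)               # no truncation, delta_sigma = 0
U_full = np.linalg.svd(Xi, full_matrices=False)[0]
a_back = U_full @ a_full                              # map back from the U basis
a_ref = np.linalg.pinv(Xi).conj().T @ x
```

Without truncation, mapping the result back with U reproduces the adjoint pseudo inverse applied to |x⟩, which suggests that |a′_(s)⟩ is the same state expressed in the SVD-related basis and that matrix U is not needed to compute it.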

FIG. 6 shows within step/stage 17, 27, 37 the recalculation of singular values in case of reduced mode matrix rank r_(fin), and the computation of loudspeaker signals |y(Ω_(l))⟩, with or without panning. The decoder diagonal matrix Σ_(l) from block 19/29/39 in FIG. 1/2/3 is fed to a step or stage 61 which calculates using value r_(l) the total energy trace(Σ²)=Σ_(i=1) ^(r) ^(l) σ_(l) _(i) ², to a step or stage 62 which calculates using value r_(fin) _(d) the reduced total energy

${{{trace}\left( \Sigma_{r_{{fin}_{d}}}^{2} \right)} = {\sum\limits_{i = 1}^{r_{{fin}_{d}}}\sigma_{l_{i}}^{2}}},$

and to a step or stage 64. The difference ΔE between the total energy value and the reduced total energy value, value trace(Σ_(r_(fin_(d)))) and value r_(fin) _(d) are fed to a step or stage 63 which calculates

${\Delta\;\sigma} = {\frac{1}{r_{{fin}_{d}}}{\left( {{{- {trace}}\;\left( \Sigma_{r_{{fin}_{d}}} \right)} + \sqrt{\left( {{trace}\left( \Sigma_{r_{{fin}_{d}}} \right)} \right)^{2} + {r_{{fin}_{d}}\Delta\; E}}} \right).}}$

Step or stage 64 calculates

$\Sigma_{t} = {\sum\limits_{i = 1}^{r_{{fin}_{d}}}{\frac{1}{\left( {\sigma_{l_{i}} + {\Delta\;\sigma}} \right)}I}}$

from Σ_(l), Δσ and r_(fin) _(d) .

Ket vector |a′_(s)⟩ is multiplied by matrix Σ_(t). The result is multiplied by matrix V. The latter multiplication result is the ket vector |y(Ω_(l))⟩ of time-dependent output signals of all loudspeakers.

The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.

The invention claimed is:
 1. A method for Higher Order Ambisonics (HOA) encoding comprising: receiving an audio input signal (|χ(Ω_(s))⟩); determining at least a ket vector (|Y(Ω_(s))⟩) of spherical harmonics and an encoder mode matrix (Ξ_(o×s)) based on direction values (Ω_(s)) of sound sources and an Ambisonics order (N_(s)) of the audio input signal (|χ(Ω_(s))⟩); determining two encoder unitary matrices (U_(s), V_(s) ^(†)) and an encoder diagonal matrix (Σ_(s)) containing singular values and a related encoder mode matrix rank (r_(s)) based on a Singular Value Decomposition of the encoder mode matrix (Ξ_(o×s)); determining a threshold value (σ_(ε)) based on the audio input signal (|χ(Ω_(s))⟩), the singular values of the encoder diagonal matrix (Σ_(s)) and the encoder mode matrix rank (r_(s)); determining a final encoder mode matrix rank (r_(fin) _(e) ) based on a comparison of at least one (σ_(r)) of the singular values with the threshold value (σ_(ε)).
 2. The method of claim 1, wherein the ket vectors (|Y(Ω_(s))⟩) of spherical harmonics and the encoder mode matrix (Ξ_(o×s)) are based on a panning function (f_(s)) that includes a linear operation and a mapping of source positions in the audio input signal (|χ(Ω_(s))⟩) to positions of the loudspeakers in the ket vector (|y(Ω_(l))⟩) of loudspeaker output signals.
 3. An apparatus for Higher Order Ambisonics (HOA) encoding comprising: a receiver for receiving an audio input signal (|χ(Ω_(s))⟩); a processor configured to determine at least a ket vector (|Y(Ω_(s))⟩) of spherical harmonics and an encoder mode matrix (Ξ_(o×s)) based on direction values (Ω_(s)) of sound sources and an Ambisonics order (N_(s)) of the audio input signal (|χ(Ω_(s))⟩), the processor further configured to determine two encoder unitary matrices (U_(s), V_(s) ^(†)) and an encoder diagonal matrix (Σ_(s)) containing singular values and a related encoder mode matrix rank (r_(s)) based on a Singular Value Decomposition of the encoder mode matrix (Ξ_(o×s)); wherein the processor is further configured to determine a threshold value (σ_(ε)) based on the audio input signal (|χ(Ω_(s))⟩), the singular values of the encoder diagonal matrix (Σ_(s)) and the encoder mode matrix rank (r_(s)); wherein the processor is further configured to determine a final encoder mode matrix rank (r_(fin) _(e) ) based on a comparison of at least one (σ_(r)) of the singular values with the threshold value (σ_(ε)).
 4. The apparatus of claim 3, wherein the ket vectors (|Y(Ω_(s))⟩) of spherical harmonics and the encoder mode matrix (Ξ_(o×s)) are based on a panning function (f_(s)) that includes a linear operation and a mapping of source positions in the audio input signal (|χ(Ω_(s))⟩) to positions of the loudspeakers in the ket vector (|y(Ω_(l))⟩) of loudspeaker output signals.
 5. A method for Higher Order Ambisonics (HOA) decoding comprising: receiving information regarding direction values (Ω_(l)) of loudspeakers and a decoder Ambisonics order (N_(l)); determining ket vectors (|Y(Ω_(l))⟩) of spherical harmonics for loudspeakers located at directions corresponding to the direction values (Ω_(l)) and a decoder mode matrix (Ψ_(o×L)) based on the direction values (Ω_(l)) of loudspeakers and the decoder Ambisonics order (N_(l)); determining two corresponding decoder unitary matrices (U_(l) ^(†), V_(l)) and a decoder diagonal matrix (Σ_(l)) containing singular values and a final rank (r_(fin) _(d) ) of the decoder mode matrix (Ψ_(o×L)) based on a Singular Value Decomposition of the decoder mode matrix (Ψ_(o×L)); determining a final mode matrix rank (r_(fin)) based on the final encoder mode matrix rank (r_(fin) _(e) ) and the final decoder mode matrix rank (r_(fin) _(d) ); determining an adjoint pseudo inverse (Ξ⁺)^(†) of the encoder mode matrix (Ξ_(o×s)), resulting in an Ambisonics ket vector (|a′_(s)⟩), based on the encoder unitary matrices (U_(s), V_(s) ^(†)), the encoder diagonal matrix (Σ_(s)) and the final mode matrix rank (r_(fin)); determining an adapted Ambisonics ket vector (|a′_(l)⟩) based on a reduction of a number of components of the Ambisonics ket vector (|a′_(s)⟩) according to the final mode matrix rank (r_(fin)); determining an adjoint decoder mode matrix (Ψ)^(†), resulting in a ket vector (|y(Ω_(l))⟩) of output signals for all loudspeakers, based on the adapted Ambisonics ket vector (|a′_(l)⟩), the decoder unitary matrices (U_(l) ^(†), V_(l)), the decoder diagonal matrix (Σ_(l)) and the final mode matrix rank.
 6. The method of claim 5, wherein the ket vectors (|Y(Ω_(l))⟩) of the spherical harmonics for the loudspeakers and the decoder mode matrix (Ψ_(o×L)) are based on a corresponding panning function (f_(l)) that includes a linear operation and a mapping of the source positions in the audio input signal (|χ(Ω_(s))⟩) to positions of the loudspeakers in the ket vector (|y(Ω_(l))⟩) of loudspeaker output signals.
 7. The method of claim 5, wherein a preliminary adapted ket vector of time-dependent output signals of all loudspeakers is determined after determining the adjoint decoder mode matrix (Ψ)^(†), and wherein the preliminary adapted ket vector of time-dependent output signals of all loudspeakers is determined based on a panning matrix (G), resulting in the ket vector (|y(Ω_(l))⟩) of output signals for all loudspeakers.
 8. The method of claim 7, wherein the threshold value (σ_(ε)) is based on, within the singular values (σ_(i)), an amount value gap that is detected starting from a first singular value (σ₁), and if an amount value of a following singular value (σ_(i+1)) is smaller than an amount value of a current singular value (σ_(i)), the amount value of that current singular value is taken as the threshold value (σ_(ε)).
 9. The method of claim 5, wherein the threshold value (σ_(ε)) is based on a signal-to-noise ratio SNR for a block of samples for all source signals and the threshold value (σ_(ε)) is set to $\sigma_{\varepsilon} = {\frac{1}{\sqrt{SNR}}.}$
 10. An apparatus for Higher Order Ambisonics (HOA) decoding comprising: a receiver for receiving information regarding direction values (Ω_(l)) of loudspeakers and a decoder Ambisonics order (N_(l)); a processor configured to determine ket vectors (|Y(Ω_(l))⟩) of spherical harmonics for loudspeakers located at directions corresponding to the direction values (Ω_(l)) and a decoder mode matrix (Ψ_(o×L)) based on the direction values (Ω_(l)) of loudspeakers and the decoder Ambisonics order (N_(l)) and to determine two corresponding decoder unitary matrices (U_(l) ^(†), V_(l)) and a decoder diagonal matrix (Σ_(l)) containing singular values and a final rank (r_(fin) _(d) ) of the decoder mode matrix (Ψ_(o×L)) based on a Singular Value Decomposition of the decoder mode matrix (Ψ_(o×L)); wherein the processor is further configured to determine a final mode matrix rank (r_(fin)) based on the final encoder mode matrix rank (r_(fin) _(e) ) and the final decoder mode matrix rank (r_(fin) _(d) ); wherein the processor is further configured to determine an adjoint pseudo inverse (Ξ⁺)^(†) of the encoder mode matrix (Ξ_(o×s)), resulting in an Ambisonics ket vector (|a′_(s)⟩), based on the encoder unitary matrices (U_(s), V_(s) ^(†)), the encoder diagonal matrix (Σ_(s)) and the final mode matrix rank (r_(fin)); wherein the processor is further configured to determine an adapted Ambisonics ket vector (|a′_(l)⟩) based on a reduction of a number of components of the Ambisonics ket vector (|a′_(s)⟩) according to the final mode matrix rank (r_(fin)); wherein the processor is further configured to determine an adjoint decoder mode matrix (Ψ)^(†), resulting in a ket vector (|y(Ω_(l))⟩) of output signals for all loudspeakers, based on the adapted Ambisonics ket vector (|a′_(l)⟩), the decoder unitary matrices (U_(l) ^(†), V_(l)), the decoder diagonal matrix (Σ_(l)) and the final mode matrix rank.
 11. The apparatus of claim 10, wherein the ket vectors (|Y(Ω_(l))⟩) of the spherical harmonics for the loudspeakers and the decoder mode matrix (Ψ_(o×L)) are based on a corresponding panning function (f_(l)) that includes a linear operation and a mapping of the source positions in the audio input signal (|χ(Ω_(s))⟩) to positions of the loudspeakers in the ket vector (|y(Ω_(l))⟩) of loudspeaker output signals.
 12. The apparatus of claim 10, wherein a preliminary adapted ket vector of time-dependent output signals of all loudspeakers is determined after determining the adjoint decoder mode matrix (Ψ)^(†), and wherein the preliminary adapted ket vector of time-dependent output signals of all loudspeakers is determined based on a panning matrix (G), resulting in the ket vector (|y(Ω_(l))⟩) of output signals for all loudspeakers.
 13. The apparatus of claim 10, wherein the threshold value (σ_(ε)) is based on, within the singular values (σ_(i)), an amount value gap that is detected starting from a first singular value (σ₁), and if an amount value of a following singular value (σ_(i+1)) is smaller than an amount value of a current singular value (σ_(i)), the amount value of that current singular value is taken as the threshold value (σ_(ε)).
 14. The apparatus of claim 10, wherein the threshold value (σ_(ε)) is based on a signal-to-noise ratio SNR for a block of samples for all source signals and the threshold value (σ_(ε)) is set to $\sigma_{\varepsilon} = {\frac{1}{\sqrt{SNR}}.}$